Sparse Faraday Rotation Measure Synthesis 



M. Andrecut, J. M. Stil and A. R. Taylor 

Institute for Space Imaging Science, 
Department of Physics and Astronomy, 
University of Calgary, Calgary, Alberta, T2N IN4, Canada 

ABSTRACT 

Faraday rotation measure synthesis is a method for analyzing multichannel 
polarized radio emissions, and it has emerged as an important tool in the study 
of galactic and extra-galactic magnetic fields. The method requires the recov- 
ery of the Faraday dispersion function from measurements restricted to limited 
wavelength ranges, which is an ill-conditioned deconvolution problem. Here, we 
discuss a recovery method, which assumes a sparse approximation of the Faraday 
dispersion function in an over-complete dictionary of functions. We discuss the 
general case, when both thin and thick components are included in the model, 
and we present the implementation of a greedy deconvolution algorithm. We il- 
lustrate the method with several numerical simulations that emphasize the effect 
of the covered range and sampling resolution in the Faraday depth space, and 
the effect of noise on the observed data. 

Subject headings: Methods: data analysis - Techniques: polarimetric - magnetic 
fields 



1. Introduction 



Faraday rotation is a physical phenomenon where the position angle of linearly polarized 
radiation propagating through a magneto-ionic medium is rotated as a function o f frequency. 



The work on astrophisical Faraday rotation has been initiated in (IBurn 



then several important contribut i ons have been added to this t o pic (IGardner Sz Whiteoak 



19661 : ISokoloff et al. I Il998l . Il999l : iKronberg I ll994J : IVallee I Il98d : IWidrow I [2002) . Recently, 



1966). and since 



Faraday rotation measure (RM) synthesis has been re-introduced as an important method 
for analyzing multichannel polarized radio data, where multiple emitting regions are presen t 
along the single line of sight of the observations ( Brentjens &: de Bruyn II2005I : iHeald II2009I ). 
In practice, the method requires the recovery of the Faraday dispersion function from mea- 
surements restricted to limited wavelength ranges, which is an ill-conditioned deconvolution 
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problem, raising important computational difficulties. Since then, three different approaches 
have been proposed to solve this pro blem. A ffist approach uses an adaptati on of the Hogbom 
CLEAN algorithm ( Hogbom \ \m^ to the RM deconvolution Jfleald Ibood l The second ap- 
proach is wa velet-based, and as sumes field symmetrie s in order to project the observed da ta 



onto < (IFrick et al. II2OIOI ). The third approach 



based on the compressed sensing paradigm (IDonoho 



Wiaux et al. ll2009l:lLi et al. Il201lh is 



2006 



Candes k Taoll2006h . All these 



methods are more or less successful in the case of mixed problems, i.e. when both thin and 
thick components are included in the model. For example, in a recent paper it has been 
shown that RM Synthesis may yield an erroneous Faraday structure in the presence of mul- 
tiple, interfering RM cona ponents, even when cleaning of the Faraday spectrum is performed 
( iFarnsworth et al. 11201 ll ). Also, to our knowledge these methods have not been evaluated in 
the presence of noise added to the observed data, a situation that makes the deconvolution 
problem even more difficult. Thus, the development of robust deconvolution methods for 
the recovery of the Faraday dispersion function in a given spectral range becomes crucial for 
the RM synthesis applications. 

Inspired by the above mentioned contributions, in this paper we discuss the case of 
sparse approximation of the complex Faraday dispersion function, i.e. we assume that 
can be approximated by a small number of discrete components, which can be both thin 
or thick. Also, we present the implementation of a greedy deconvolution algorithm, and 
we illustrate the described method with several numerical simulations which emphasize the 
effect of the covered range and sampling resolution in the Faraday depth space, and the 
effect of noise on the observed data. The numerical results show that the described method 
performs quite well for simple component mixtures, at typical sampling resolution values 
and coverage range in the Faraday depth space, and it is quite robust in the presence of 
noise. We show that the described technique is well suited for exploratory data analysis, 
where prior information about the component distributions is not available, and it can be 
used as a complement to the previously proposed methods. 

Although a sparse solution is an idealized model of a complex astrophysical system, the 
potential complexity of the solutions is adequate for a wide range of astrophysical situations. 
The sparseness requirement steers the solution to include the smallest number of components 
required to fit an observed Faraday depth spectrum. Double-lobed radio galaxies that are 
not resolved by the telescope may experience different Faraday rotation in each lobe because 
the differences in the foreground on scales smaller than the beam. The lobes themselves 
may be extended and experience differential Faraday rotation as well. A sparse solution may 
consist of two discrete Faraday components representing each lobe. If the data are good 
enough to detect differential Faraday rotation across the source, the solution may include 
one or more components with a finite extent in Faraday depth. Complex source structure 
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may be built up out of a dictionary of basic thin and thick Faraday components, subject to 
the requirement that the solution remains sparse. 

In the diffuse interstellar medium, a case where Faraday rotation of Galactic synchrotron 
emission is dominated by a single HIT region along the line of sight is an example of a system 
that is well approxi mated with two c o mpon ents i n Faraday depth , e.g. t he circular Faraday 



screen discussed by iHaverkorn et al.l (l2003l ) and iDe Bruyn et al.l (l2009l ) . As in the case of 
double lobed radio sources, the sparse solution is not limited by two delta functions in 
Faraday depth, as it can increase in complexity if warranted by the data. 

The assumption of sparseness may fail in case there is a power on a large range of Faraday 
depths, defined by the minimum and maximum Faraday depth detectable in a survey. This 
may occur in some supernova remnants with complex structure and strong magnetic fields. 



2. Rotation measure synthesis 

In this section we give a b rief description of the Faraday R M synthesis problem, following 



the formulation introduced in iBrentjens &: de Bruyn I (120051 ) 



The Faraday rotation is characterized by the Faraday depth (in radm ^), which is 
defined as: 

/•observer 

(t){r) = 0.81 / n^B ■ dr, (1) 

J source 

where ng is the electron density (in cm~^) , B is the magnetic field (in /iG), and dr is the 
infinitesimal path length (in parsecs). We also define the complex polarization as: 

P(A2) = g(A2) + iU{X^) = ple^'^^^'\ (2) 

where p is the fractional polarization, /, Q, U are the Stokes parameters, and x{^^) is the 
polarization angle observed at wavelength A: 

x(A) = -arctan^. (3) 

The Faraday RM is defined as the derivative of the polarization angle with respect to 

A^: 

RM{X') = (4) 

We now identify RM with the Faraday depth 0, and we assume that the observed 
polarization -P(A^) originates from the emission at all possible values of 0, such that: 

r+co 

P{X') = / F(0)e2^*^'rf0, (5) 
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where F{(f)) is the complex Faraday dispersion function (the intrinsic polarized flux, as a 
function of the Faraday depth). Thus, in principle F((f)) is the inverse Fourier transform of 
the observed quantity -P(A^): 



/+00 
P{X^)e-^'^^'dX^. (6) 
-oo 

However, this operation is ill-defined since we cannot observe -P(A^) for < 0, and also in 
practice the observations are limited to an interval [X^in, Xf^^xl- 

In order to deal with the above limitations, the observed polarization is defined as: 

P(A2) = iy(A')P(A'), (7) 

where W is the observation window function, with W{X'^) > for A^ G [A^j„,A^Q^], and 
W{X'^) = otherwise. Therefore, we obtain the reconstructed dispersion function: 

/+00 
p^X^^e-^i4>>^'dX'', (8) 
-oo 



where 



W{X')dX' 



(9) 



is the normalization constant for the observation window. The reconstructed dispersion 
function can also be written as: 

F(0) = i?(0)oF(0), (10) 
where o is the convolution operator, and 

/ + 00 
W{X^)e-^'^^'dX^, (11) 
-oo 

is the RM spread function (RMSF). 

Using the shift theorem, we can also write: 

/ + 00 
p^_^2^g-2#(A2-A2)^^2^ (12) 
-oo 

/ + 00 
-oo 

where A^ is the mean of the sampled values in [A^j„, A: 



and 



^ 1 

maxi 
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The goal of the analysis is to find F^cj)) from the observed values -P(A,^) = P„ (i.e. Q„ 
and Un) over N discrete channels G [A^j„, A^^^], 'n. = 0,l,...,A^ — 1, with the given weights 
(A^) = Wn- Since the measured values are discrete (each value constitutes an integral over 
the channel centered at A^), we should consider the discrete versions of the above equations, 
i.e.: 

N-l 

F{(f)) -aJ2 Pne~^'^^^"-~^'^ , (14) 

n=0 

and respectively 

N-l 

R{(j)) ^AJ2 Wne-^'^^^'-~^'\ (15) 

n=0 

The reconstructed function F{(j)) depends on the window W{\^), which acts as a filter, and 
improves substantially by increasing its coverage in the A^ space. Obviously, i^(0) is a "dirty" 
reconstruction of F(0), i.e. the convolution of -F(0) with -R(0), and and a deconvolution step 
is necessary to recover F{(j)). 



3.1. 



Sparse approximation 
Discrete representation 



In general, the number of data points is limited by the number of independent measure- 
ment channels, and therefore there are many different potential Faraday dispersion functions 



consistent with the measurements ( Burn 



Frick et al. 


2010; 


Li et al. 


2011: 



1966 



Brentiens fc de Bruvn I l2005l : iHeald 1 12009 



such ambiguities, is to impose some extra constraints on the Faraday dispersion function. 
Our approach is based on th e recently introduced framework of compressive sensing (IDonoho 



LOT 

20061 : ICandes fc Tao II2006I ). Compressive sensing relies on the observation that many types 
of signals can be well-approximated by a sparse expansion in terms of a suitable basis, or 
dictionary of functions. The main idea of compressive sensing is that if the signal is sparse, 
then a small number of measurements contain sufficient information for its approximate or 
exact recovery. In our case, the problem is to reconstruct a sparse F{(j)) from a relatively 
small number of -P(A^) measurements. Therefore, we assume that the model of F{(j)) is 
sparse in an over-complete dictionary of functions. By over-complete we understand that 
the number of functions in the dictionary is larger than the number of independent observa- 
tion channels. Thus, the dictionary functions may be redundant (linearly dependent), and 
therefore non-orthogonal. In order to give a proper formulation of this approach we need to 
introduce a discrete representation of the space. 
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It is known (IBrentjens fc de Bruyn I l2005l ) that, for a discrete sampled Faraday dis- 
persion function, the full width at half maximum of the main peak of the RMSF is given 
by: 



(16) 



where AA^ is the width of the observation interval. Also, using a uniform grid in space 
one can estimate the maximum observable Faraday depth by: 



73 
(5A2' 



(17) 



where ^A^ = AX^/N is the width of an observing channel (IBrentjens fc de Bruyn 1120051 ). 
This estimation of (pmax is only an approximation, since in reality only the frequency u is 
sampled linearly. Therefore, in our discrete representation we consider a nonlinear grid in 
the A^ space: A^ = /v^i where z/„ = {ymax — ^minjl^ is the centered frequency of the 
channel n = 0,l,...,iV — 1, and c is the speed of light. Also, we consider a linear grid in the 
(/) space, where the computational window the sampling resolution and the number 
of points M are set to: 



M 



where \_x\ is the integer part of x. 

The model of F{(j)) is therefore characterized by a uniform grid, (pm = —4'win + ^'Pri 
m = 0, 1, M — 1, and a vector z = [zq, Zi, ...,zm-i] ^ C^^, which is assumed sparse, i.e. it 
has a small number of non-zero components, corresponding to the complex amplitudes of the 
sources located on the (pm grid. For example, a thin source with the amplitude Zm, located 
at (prn , will be approximated by the product of z^ with a Dirac function 5(0 — 4>m), while a 
thick source will be characterized by a contiguous set of non-zero amplitudes in the vector z, 
which requires a different set of adaptive functions, capable of capturing their position and 
extensive support in the (p space. The goal of the analysis is to find the vector z, which is a 
discrete approximation of the Faraday dispersion function F[(j)), from the measurements Qn 
and Un, n = 0,1, N — 1. 



3.2. Dirac approximation 



Since, in general we can have M > N, the Dirac functions 5(0 — 0m), m = 0,1, M — 1, 
form an over-complete dictionary in the space. The decomposition of -F(0) with respect 
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to the Dirac over-complete dictionary is: 

M-l 



m=0 



F{(f))=Y,^mS{(p-<l)m). (19) 

m 

From the equations (5) and (7) we obtain: 

°° m=0 m=0 



M-l 



(20) 



We observe that the transformation of F{(j)) into -P(A^) can be written in a matrix form as 
following: 

VV^Hz = p, (21) 

where 

p=[Po,Pu-,PN-if eC^, (22) 

is the A^-dimensional complex vector of observations, and ^ G C^^^^ is the N x M matrix 
with the Fourier terms: 

^n,m = e''^-'", (23) 

and W is the N x N diagonal matrix, with the diagonal elements equal with the channel 
weights: Wn,n = W^. 

If we are searching for the sparsest solution possible, then the io norm of z: 



Io = ^ h{z„,), (24) 



m=0 

should be minimized. This sparseness assumption leads to the following optimization prob- 
lem: 

min II^IIq subject to W'^z=p. (26) 
However, finding the minimum ^Q norm is an NP-complete problem, which re quires a combi- 



natorial search of the parameter space and therefore is practically unfeasible (jPonoho 112006 



Candes fc Tao II2006I ). A better approach is to replace the norm with the ^l norm: 



M-l 



li=$^kfc|, (27) 



ni=0 



- 8 - 



which transforms the combinatorial prob 



polynomial time ( iBoyd &: Vandenberghe 



em in to a convex problem, that can be solved in 



20041). and it has been shown to give solutions 



close to the Iq norm solutions (jChen et al. Il200ll ). Thus, the problem can be reformulated 
as finding the vector z such that: 

min||2;||j^ subjectto W'^z = p. (28) 

One can see that we do not make any assumption on the number of non-zero components, 
we just assume that their number is smaller than M. 

So far we have not considered the influence of noise on the observed data. We assume 
a complex noise vector t] G C^, with the components rin & C having the real and respective 
imaginary parts sampled from a normal distribution with zero mean and standard deviation 
a: R.e{rin}, Im{?7„} G A^(0, cr). Thus, the transformation of F{(j)) into -P(A^) can be rewritten 
as: 

W^z + r] = P, (29) 
and the minimization problem can be reformulated as: 

minll^ll^ subjectto WlV^z - p\\l < {/3(r)'^. (30) 

z 

The use of the £i norm induces sparsity in z, while the constraint ensures W'^z ^ p. Since 
p is observed in the presence of noise, it is reasonable to not enforce W'^z = p exactly, and 
to stop the minimization process when the norm of the residual becomes comparable with 
the standard deviation of the noise {/3 ~ VN). 



3.3. Generalization 



Dirac functions can be used to approximate thin sources only. In orderDonoho2006 
to approximate thick sources we extend the dictionary by incorporating a set of functions, 
characterized by adaptive translation and scaling properties, such that they are capable to 
capture the position and the extent of thick sources in the space. Thus, we assume that 
-F(0) has a s parse approximation in an over-complete dictionary $ of functions v^j(0) G 
called atoms dMallat fc Zhang lllQQsl ): 



(31) 



Here, J > M is the number of atoms in the dictionary, and only a small number of the 
complex coefficients are assumed to be non-zero. Thus, by introducing the M x J complex 
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matrix $ G C^^^'^, with the elements ^m,j = '^j{4>m), and the sparse complex vector C, = 
[^0)^1) ^ C"^, and taking into account that: 

n = (32) 

we obtain the following minimization problem: 

m^in ||,^||-^ subject to ||r^ — p||2 < (/3cr)^, (33) 

where 

r = (34) 

is a X J complex matrix. In this more general case, the goal is to find the sparse vector 
^ G C"^ in the dictionary space. Obviously, when the dictionary is reduced to the Dirac basis 
we have J = M, ^ = z and $ = /, where / is the M x M identity matrix, and therefore F 
reduces to the weighted Fourier matrix, F = W'^. 



3.4. Over-complete dictionaries 

We should note that an over-complete dictionary $ that leads to sparse representations 
can be chosen as a pre-specified set of analysis functions (wavelets, Gaussian packets, Ga- 
bor functions etc.), o r designed by modeling its content to a given set of signal examples 



(ICandes &: Tao II2006I : iMallat fc Zhang Ill993l ). The success of such dictionaries in applica- 



tions depends on how suitable they are to sparsely describe the signals in question. A general 
family of analysis functions can be obtained by scaling and translating a single normalized 
window function </:>(</)), with ||<^||2 = 1- Therefore, for any scale a > and translation 6 G M 
we define the atom (pj of the dictionary $ as following: 

(^^•(0) = y?j(„,fc)(0) = ( ^— ) . (35) 



Therefore, the index j of the atom function depends on both a and b parameters: j = j{a, b). 
Thus, in order to represent F{(f)) in the dictionary $, we need to select an appropriate 
countable subset of atoms (pj, j = 0, 1, J — 1, such that F{(j)) can be written as a linear 
expansion. Depending on the choice of the atoms (pj, the expansion coefficients will give 
explicit information about the behavior of F{(f)). For example, we should note here that 
different wavelet transforms correspond to different families of atoms. In our definition, 
we do not limit the dictionary to a single wavelet basis, on contrary we consider an over- 
complete set, which also may contain different concatenated families (sub-dictionaries) of 
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such analysis functions. In order to illustrate numerically this approach, let us consider the 
boxcar dictionary, defined as: 

/A\ / if b<(l)<b + a 

= 1 otherwise " ^'^^ 

An important characteristic of the boxcar dictionary is that it can capture sources with arbi- 
trary thickness. Another advantage is its easy discretization. In our case, the discretization 
grid has M points (pm with the sampling resolution (pR. Thus, assuming that the maximum 
width of a boxcar atom is a^ax = ScpR, where S < \_M/2\, then for each scale a — scpR, 
s = 1, 2, and translation h — IcpR, 1 — 0,1, M — s we can define a boxcar function 
with the index j — j{s, I), such that: 

/ , X _ / , X / if l<m<l + s 

M<l>m) = ^Jis,l0m) - [ ^^^^^^^^^ ■ (37) 

Therefore, one can define maximum J = SM — S{S + l)/2 boxcar functions on such a 
grid, and we can easily build a discrete dictionary matrix $ of size M x J. In this paper 
we limit our discussion to the boxcar dictionary defined above, since it is simple enough to 
illustrate the approach, and to provide meaningful results. Also, this dictionary includes by 
construction the Dirac set of functions, which in this case are the first M functions with 
s = 1. A similar approach can be used to build sub-dictionaries corresponding to other 
families of analysis functions. 

3.5. Multi-scale analysis 

The sparse decomposition can also be used to perform a multi-scale analysis, by con- 
sidering all the dictionaries $5, where S = 1,2, .... Smax ^ lM/2\. Also, let us assume that 
= ^sC is the solution obtained for the scale S, i.e. the recovered discrete representation 
of F{(f)) with the dictionary $5. We consider a S^ax x M matrix S, where each fine with 
the index S corresponds to the solution obtained for the scale 5", i.e. Es = zs — $5^. 
Obviously, the solution zs will depend on the maximum scale S used in each dictionary $5, 
and by visualizing the matrix S, we obtain a representation of the behavior of the solution 
at different scales. 
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4. Matching pursuit 



The sparse optimizati on problem, d efined in the previous section, is known as Basis 
Pursuit Denoising (BPD) (jPonoho II2006I ). and if written in a Lagrangian form: 

"1 



mm 



lire 



Pll? + a||elli 



(38) 



it can be thought of as a least squares problem with an ii regularizer, where a > is a pa- 
rameter that controls the trade-off between sparsity and reconstruction fidelity. Thus, BPD 
solves a regularization problem with a trade-off between having a small residual and making 
the solution simple in the ii sense. The solutions of BPD are often the b est computationally 
tractable approximation of the under-determined system of equations (IDonoho fc Tanner 
20051 ). In our case, since the direct space and the inverse Fourier space are perfectly incoher- 
ent, the problem can be solved using linear programming techniques whose computational 
complexities are polynomial. However, for the sparse RM approximation problem, the BPD 
approach requires the solution of a very large convex, non-quadratic optimization problem, 
and therefore suffers from high computational complexity. Due to the complexity of the 
linear programming appr oach, several o t her £i optimization m ethods have been proposed to 



solve the BPD problem (IDonoho II2006I : ICandes &: Tao II2006I ). Here, we consider a method 
based on sub-optimal greedy algorithms, which requires far less computation. Our goal 
is not only to obtain a good sparse expansion, but also to provide a fast computational 
metho d, therefore here we foc us our attention on the greedy Matching Pursuit (MP) algo- 
rithm (|Mallat fc Zhang 1119931 ). which is the fastest known algorithm for the BPD problem 
( IChen et al. Il200ll ). MP has many applications in signal and image coding, shape repre- 
sentation and recognition, data compression etc. One of its main features is that it can be 
applied to arbitrary dictionaries. 

Starting from an initial approximation ,^(0) = and residual r(0) = p, the algorithm 
uses an iterative greedy strategy to pick the column vectors P*^-'-' which best reduce the 
residual. At every time step t the current residual r{t) can be decomposed as following: 

r{t) = (r(t),r(^')) ||r(^')||;'r(^') + r(t + l), (39) 

where r(t + 1) is the future residual, and (., .) is the standard inner product operator in the 
complex Hilbert space. Since r{t + 1) and P^-'-' are orthogonal, (^r{t + 1), P^-'^) = 0, we have: 

-|(r(t),p(^-)>f ||P(^-)||;'. 



\r{t+l)\\; = \\r 



(40) 



In order to minimize the norm of the future residual, the algorithm should choose the column 
vector P^-'^ which maximizes the projection on the current residual: 

-1" 



arg max 

j 



(r(t),p(-''))| llP^-'')] 



(41) 
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Therefore, after choosing the best column F*^^*^ one can update the solution and the residual 
as following: 

^{t + 1) = ^{t) + cT^''\ (42) 
r{t + 1) = r{t) - cr^'''\ (43) 

where 

c= {r{t),T^^'^)\\T^'''^\\~\ (44) 

Thus, after t iteration steps the resulted solution is a sparse vector ^ with the non-zero 
coefficients C,kf The algorithm stops when the maximum number of iterations has been 
reached (which usually is set to J), or when the norm of the residual becomes comparable 
with the standard deviation of the noise. The reconstruction of the target signals is then 
given by: 

J-1 

z = J2^j^'^'^ = H, (45) 
i=o 

J-1 

P = E^^r^'^ = re (46) 

j=0 

The pseudo-code of the RM-MP algorithm is listed in the Appendix. 



5. Numerical results 



5.1. Two different experiment layouts 



In order to illustrate the described deconvolution method, we have considered two 
different experiment configurations, corresponding to two different ranges of observed fre- 
quencies. The first one is consistent with the observations with the Westerbork Synthesis 
Radio Telescope (WSRT) in the frequency range 315 MHz to 375 MHz, as described in 
(IBrentjens &: de Bruyn II2005I ). The second one is consistent with the observations with the 
Arecibo telescope in the frequency range 1225 MH z to 1525 MHz, for Th e Galactic ALFA 
Continuum Survey (GALFACTS), as described in (ITaylor &: Salter Il2010l ). The separation 
between the frequency windows is roughly 1 GHz, and therefore the maximum observable 
Faraday depth and the half maximum of the main peak of the RMSF are quite different. 
Here we will show that the RM-MP method provides very good results in both cases. 

As a testbed for numerical simulations, we have considered a mixed scenario consisting of 
three components with different widths, such that the simulation results provide the response 
of the algorithm to a full range of component widths. The first one is a thin component 
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given by: F{—0.5(f)win) = 9 — 8?. The second is a thick component given by: F{(f)) = —7 + 8i 
if — O.O20^j„ < < 0.02(j)y^in, and F{(j)) = otherwise. The third is a thicker component 
defined by: F{(j)) = 8 — 6z if 0.46</)^j„ < < 0.540^4^, and = otherwise. Thus, this 
scenario can be easily scaled for different computational windows [—(j)win, 4>win], where 
is given in radm~^. Also, we have considered that all the observational channels are equally 
weighted: i.e. Wn = I, n = 0, 1, N, and A = l/N. 



5.2. WSRT 



The various parameters associated with the WSRT experiment layout ( iBrentjens &: de Bruyn 



20051 ) are listed bellow: 



Frequency range: Umin = 315 MHz, Umax = 375 MHz; 

Wave length range: A^^^ = 0.639 m^, A^„^ = 0.905 m^, AA^ = 0.266 m^; 

Number of channels: = 126; 

Half maximum of the main peak of the RMSF: 6(f) = 12.990 radm^^; 
Maximum observable Faraday depth: (pmax = 818.414 radm~^; 

Let us first consider the ideal noiseless case, when the sampling resolution in the 
space is equal with the half maximum of the main peak of the RMSF, (pn = 6(j), and the 
computational window is = (pmax- In this particular shown in Figure 1, the RM- 

MP algorithm provides an exact solution, since N = M = 126 and therefore no information 
is lost in the measurement. One can also notice that in this noiseless exact sampling case, 
the solution is independent of the scale used in the dictionary, as it can be seen on the 
multi-scale representation for < 5 < 25. However, the problem becomes ill-defined in the 
following situations: the noise is present; the sampling resolution becomes finer than the 
half maximum of the main peak of the RMSF, (pn < 6(f); and the number of independent 
observed channels is smaller than the number of points in the space, N < M. In this case 
the system becomes under-determined, and therefore some information is lost. In order to 
exemplify this situation, we consider a scenario in which all these factors are present. We 
add noise with the standard deviation a = \/N = 11.22, to the Q and U values. We limit 
the computational window to (pwin = 126radm^^ < (pmax, and we increase the number of 
points on the grid to M = 252, which is double of the number of observation channels 
N = 126. This results in a sampling resolution of 0/? = Iradm"^ ^ 50. The obtained 
results (for /3 = \/2N, S = 25) are shown in Figure 2. One can see that the phase of some 
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components cannot be reliably recovered anymore, since there is not enough information in 
the signal to be detected properly. We should note that the problem is correctly resolved in 
the noiseless case (not shown here). Thus, the effect of noise addition consists in a partial 
loss of information about the phase of F{(j)), which is expected, since the number of solutions 
compatible with the data increases dramatically with the added noise. Also, we should point 
out that the solution improves by increasing the signal-to-noise ratio, as shown in Figure 3, 
where we have increased the amplitude of the components by a factor of 1.5, keeping their 
phase unchanged. One can see that in this case, the RM-MP method resolves correctly all 
the components. This result suggests that an adequate signal to noise ratio should be taken 
into account, in order for the method to be successful. 



5.3. Arecibo 

The GALFACTS survey, carried out with the Arecibo telescope, has the following pa- 
rameters: 

Frequency range: Umin = 1225 MHz, Umax = 1525 MHz; 

Wave length range: A^^„ = 0.0386 m^, A^„^ = 0.0598 m^, AA^ = 0.0212 m^; 

Half maximum of the main peak of the RMSF: Sep — 163.044 radm~^; 

The maximum observable Faraday depth, (pmax, is inverse proportional with the width 
of the observation channel 5X, and by increasing the number of channels, the maximum 
observable Faraday depth becomes unreasonable high. Therefore, in order to obtain some 
meaningful results, we have to limit both the number of observation channels in the A^ space, 
and the computational window in the space. 

First we consider that the compuational window is limited to 0^j„ = 1800radm~^ and 
the number of observation channels is N — 200. Also, we consider the same testbed as for 
WSRT case, and we add noise with the standard deviation a — y/N — 14.14. In addition, we 
increase the number of points on the grid to M = 300, and therefore we obtain a sampling 
resolution: 0/? = 12radm"^ <^ 6(f). The obtained results (for /S = \/2iV, S = 25) are shown 
in Figure 4. One can see that all the components are relatively well resolved, with a small 
error in the phase, but with almost exact amplitudes. In the next experiment we zoom 
more in the (p space, and we impose a computational window of (p^in = 900radm~^, keeping 
the same number of observation channels and number of points on the (p grid, such that 
(pR — 6radm~"^. The results are again reasonable good, as shown in Figure 5, with a small 
variation of the phase due to the uncertainty introduced by the noise addition. However, if we 
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zoom further in the (f) space the solution is not as good anymore, as it can be seen in Figure 
6. In this case we have a much finer samphng resolution (p^ = 3radm~^, corresponding to 
a computational window of (pwin = 600radm~^, number of observation channels N = 400, 
and the number of points on the grid M = 400. This is a consequence of the fact that by 
increasing A^, we have increased also the standard deviation of the noise to a = \/iV = 20, 
such that the signal to noise ratio is smaller than before. Again, an improved solution can 
be obtained by increasing the amplitude of the components, such that the signal to noise 
ratio is higher. 

5.4. Beyond the RMSF resolution 

In the previous numerical experiments we have shown that the RM-MP algorithm is able 
to resolve correctly the components from the input F{(j)) model, if the separation between 
the components is higher than the half maximum of the main peak of the RMSF. In order to 
estimate the response of the RM-MP algorithm at resolutions beyond the RMSF hmit, we 
consider two Dirac components, F(0 — A0j„/2) —9 — 7i, and respectively F(0-|- A0j„/2) = 
—9 + 7i, separated by A0j„ < 5(f), where S(f) is the half maximum of the main peak of the 
RMSF. The numerical experiments show that the RM-MP algorithm cannot resolve correctly 
the two components, but returns a boxcar function centered at the exact position value 0, 
with a width equal with the separation between the two components. In order to illustrate 
this result we consider the WSRT scenario, with = (pmax, N = 126 and M = AN = 1008, 
which gives a sampling resolution (pR = 0.812 radm~^, in the space. In Figure 8 we give 
the width of the output boxcar function A0o„t as a function of the input separation A0j„. 
One can see that for all performed experiments we have ^(t>out — ^<Pin- Also, in Figure 9 
we show a typical example, where the input separation is A0j„ = — 4.06radm~^, or 
approximatively 30% from S(f). Thus, even at resolutions beyond the half maximum of the 
main peak of the RMSF, the RM-MP algorithm provides some useful information, i.e. the 
position and the separation width of the two components. 

5.5. Discussion 

The above numerical experiments have shown that the sparse RM-MP method works 
well for relatively simple sparse problems. We should note that the method can be used to 
recover more complex dispersion functions. For example, let us consider the situation from 
Figure 7, where we have two thin components and two thick components. The first thick com- 
ponent is modeled as a Gaussian, while the second is modeled as a boxcar function. Also, we 



- 16 - 



assume the noisy WSRT experiment configuration, with: a = yN ^ (t>win = 818.414 radm~^, 
M = 220 and 0r = 7.440 rad m~^ < 6(j). One can see that all the sources are almost exactly 
recovered, including the thick Gaussian, even though the dictionary does not contain any 
Gaussian functions. In fact, the shape of the Gaussian is reconstructed from several boxcar 
functions from the dictionary. Thus, the boxcar dictionary can be used to recover more 
complex functions. However, the success of the method depends on another aspect which 
has not yet been discussed. More specifically, the performance of the RM-MP method de- 
pends on the number of observation channels N, the number of points M on the grid, and 
the number K of non-zero components in the discrete representation of the Faraday depth 
function F{(j)). An important question here is t hat giveii N and M, what is the ma ximum 
value of K, for a faithful recovery of F{(j))7 In (jPonoho I l2006l : ICandes &: Tao II2006I ) it has 
been shown that any i^'-sparse signals of length M, with K <^ M, can be recovered from 
only N > cK < M random measurements (projections), where c ~ log{M/K). The answer 
to this question is not obvious for the sparse RM synthesis problem, since the reconstruction 
process will depend on experiment layout, i.e. the observed frequency band and the half 
maximum of the main peak of the RMSF. This is an important theoretical question which 
we would like to address in the future development, in order to improve the performance of 
the method. 



6. Conclusions 



The recently introduced Faraday RM synthesis is becoming an important tool for an- 
alyzing multichannel polarized radio data, and derive properties of astrophysical magnetic 
fields. The method requires the solution of an ill-conditioned deconvolution problem, in 
order to recover the intrinsic Faraday dispersion function, and therefore the development of 
robust methods has become crucial for the RM Synthesis applications. Here, we have as- 
sumed that the complex Faraday dispersion function F{(f)) can be approximated by a small 
number of discrete components from an over-complete dictionary, and we have developed a 
greedy algorithm to solve the deconvolution problem. The method uses an over-complete 
dictionary of functions which can be efficiently used in a multi-scaling context, and it can 
easily include different types of analysis functions. We also have presented several numerical 
simulations showing the effect of the covered range and sampling resolution in the Faraday 
depth space, and the effect of noise on the observed data. The numerical results show that 
the described method performs well at common resolution values and coverage range in the 
Faraday depth space, and it is quite robust in the presence of noise. Therefore, the described 
technique is well suited for exploratory data analysis, and it can be used as a complement 
to the previously proposed methods. 
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A. Appendix material 

The pseudo-code of the RM-MP algorithm: 

^ •<— 0; solution vector ( J-dimensional) 
r initial residual (A^- dimensional) 
r •<— H^^<l>; systems matrix (A^ x J-dimensional) 
cr; standard deviation of the noise 
/3; stopping (regularization) parameter 
tjnax', maximum number of iterations, 
c •<— 0; the projection coefficient 
Cmax ^ 0; the selected projection coefficient 
k; the index of the selected column 
for(i = 0, 1, ...,tmax - 1) 
{ 

for(j = 0,l,...,J-l) 
{ 

c^(r,rw>||rw||;^ 

if(|c| > \Cmax\) 
{ 

(^max ^ 

} 

} 

C Cjjiax 1 1 r ^ 1 1 2 ) 

Cfe Cfe + c; 



if(||r||2 < {P<yf) then break; 
} 

return ^; 



- 19 - 



REFERENCES 

Brentjens, M. & de Bruyn, A. 2005, A&A, 441, 1217. 

Boyd, S. & Vandenberghe, L. 2004, Convex Optimization, Cambridge University Press. 
Burn, B. J. 1966, MNRAS, 133, 67. 

Candes, E., Tao, T. 2006, IEEE Trans. Inf. Theory, V52, 5406. 
Chen, S., Donoho, D., Saunders, M., 2001, SIAM Rev., 43, 129. 

De Bruyn, A. G., Bernardi, G. & The LOFAR Team, in The Low-Frequency Radio Universe, 
proceedings of the conference held 8-12 December 2008, at National Centre for Radio 
Astrophysics (NCRA), TIFR, Pune, India, ed. D. J. Saikia, D. A. Green, Y. Gupta 
& T. Venturi, ASP Conf. 407, 3. 

Donoho, D., Tanner, J. 2005, PNAS, 102(27), 9446. 

Donoho, D. 2006, IEEE Trans. Inf. Theory, V52, 1289. 

Farnsworth, D., Rudnick, L., Brown, S. 2011, AJ, V141(6), 191. 

Prick, P., Sokoloff, D., Stepanov, R., and Beck, R. 2010, MNRASrLett., 401, L24. 

Gardner, F. F. & Whiteoak, J. B. 1966, ARA&A, 4, 245. 

Haverkorn, M., Katgert, P., & de Bruyn, A. G. 2003, A&A, 404, 233. 

Heald, G. 2009, Cosmic Magnetic Fields: from Planets, to Stars and Galaxies, 259, 591. 

Hogbom, J. 1974, A&A Supp., 15, 417. 

F. Li, S. Brown, T. J. Cornwell and F. de Hoog, 2011, A&A, 531, A126. 
Kronberg, P. P. 1994, Rep. Prog. Phys., 57, 325. 
Mallat, S., Zhang, Z. 1993, IEEE Trans. Signal Process., V41, 3397. 
Sokoloff, D. D., Bykov, A. A., Shukurov, A., et al. 1998, MNRAS, 299, 189. 
Sokoloff, D. D., Bykov, A. A., Shukurov, A., et al. 1999, MNRAS, 303, 207. 



-20- 



Taylor, A. R. & Salter, C. J. 2010, The Dynamic Interstellar Medium: A Celebration of the 

Canadian Galactic Plane Survey. Proceedings of a conference held at the Naramata 
Centre, Naramata, British Columbia, Canada on 6-10 June 2010. Edited by R. Kothes, 
T. L. Landecker, and A. G. Willis. San Francisco: Astronomical Society of the Pacific, 
2010, p.402. 

Vallee, J. P. 1980, A&A, 86, 251. 

Wiaux, Y., Jacques, L., Puy G., Scaife, A., & Vandergheynst, P. 2009, MNRAS, 395, 1733. 
Widrow, L. M. 2002, Rev. Mod. Phys., 74, 775. 



This preprint was prepared with the A AS macros v5.2. 



- 21 - 




Fig. 1. — WSRT experiment layout, noiseless exact sampling case: M = N — 126 and 
(l>R — = 12.990 radm~^. The figure is bottom-up organized: the bottom row is the 
measured data, i.e. (^(A^), f/(A^), and -P(A^); the second row is the input (original) model 
of F{(f)); the third row is the dirty F{(f)); the forth row is the RM-MP algorithm recovered 
and the fifth row is the multi-scale representation of the solution (see text for details). 
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Fig. 2. — WSRT experiment layout, noisy sampling case (cr = sfN): N = 126, M = 252 
and 4>ji = 1 rad m~^ <C 50. 
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Fig. 4. — Arecibo experiment layout, noisy sampling case (cr — sfN): N — 200, M — 300 
and (pR = 12radm~^ <C d4>. 
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Fig. 5. — Arecibo experiment layout, noisy sampling case (cr — sfN): N — 200, M — 300 
and (pR = 6radm~^ <C 54>. 
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7. — WSRT experiment layout, noisy sampling case (cr = y/N)'- N 
bn = 7.440 rad m-2 = 0.5750. 
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Fig. 8. — The width of the boxcar function response A^o^^ as a function of the separation 
width A0j„ between two Dirac components. 
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Fig. 9. — A typical response of the RM-MP algorithm for two Dirac components separated 
by A0j„ = 5(f)R = 4.06radm^^ < 6(f) = 12.99radm^^. WSRT experiment layout, noiseless 
sampling case: N = 126, M = 1008 and = 0.812 rad m-^. 



